<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
  <title>Chemistry Functionality</title>
  <base href="../">
  <link type="text/css" href="styles.css" rel="stylesheet">
</head>

<body>
<hr align="left" size="2" width="550">
<h2><a name="ChemicalStructures"></a>Working With Chemical Structures</h2>

<hr align="left" size="2" width="550">
<p><span class="keyword">DataWarrior</span> was designed from the outset as a chemistry aware
data analysis and visualization platform. Its built-in chemistry intelligence allows working
with chemical structures as easily as with alphanumerical data. Rows may be filtered based on
whether molecules contain certain sub-structures or on various kinds of molecule similarities.
Views may display molecular structures as labels or on the axes. Various kinds of molecule
similarities can be translated into marker positions, sizes or colors.
Data analysis methods like the principal component analysis or self organizing maps can be
applied to chemical structures equally well as to alphanumerical data.</p>
<p>Dedicated cheminformatics functionality provide for state-of-the-art analysis methods from the
mere similarity comparison of two molecule lists to more advanced methods as an activity cliff
analysis, SAR-table creation or the enumeration of virtual combinatorial substance libraries.
2-dimensional scaling methods help to visualize compound collections and the prediction of
many compound properties to support the characterization and selection of compounds.</p>
<br>

<h3><a name="StructureFilter"></a>The Structure Filter</h3>

<table class="wspace">
 <tr>
  <td class="wspace"><img src="help/img/chem/ssSearch.jpeg"></td>
  <td class="wspace">If a <i>DataWarrior</i> file contains chemical structures, then usually
the filter panel contains one or some <span class="keyword">Structure Filters</span>. In any
case one can always add new <span class="keyword">Structure Filters</span> by selecting
<span class="menu">New Filter...</span> from the <span class="menu">Edit</span> menu.</td>
  </tr>
</table>

<br>
There are three kinds of options within a <span class="keyword">Structure Filter</span>:<br>

<li><span class="menu">contains</span> hides compounds (not) having a
certain a sub-structure</li>

<li><span class="menu">is similar to [...]</span> hides rows with
(dis-)similar structures. The descriptor used is given in brackets.</li>

<li>The <span class="menu">&lt;disabled&gt;</span> option disables the filter
but keeps the query molecule for later filtering.</li>
<br>

<p>A double-click on the filter's structure field opens the <span class="keyword">Osiris
Structure Editor</span>. Please note that the behavior of the editor differs between the
sub-structure search and the similarity search. In the former case you edit a potentially
substituted chemical fragment that may contain query features while in
the latter case you define a complete molecule.</p>

<p align="center"><img src="help/img/chem/editor.jpeg"></p>
<p><center><i>Structure editor in molecule mode.</i></center></p>

<p>To edit a structure one typically selects a tool from the buttons on the left and
uses the mouse to apply the tool in the right structure area. The keyboard can be used
to accelerate the drawing process, e.g. typing a digit adds a chain of n atoms or changes
a bond order depending on whether the mouse pointer is on top of an atom or a bond.
'+' or '-' change an atom charge and <i>Del</i> removes selected atoms.
Typing one or more letters changes an atom type. More options to modify an existing
atom are available in the atom dialog, which opens after a double click on an atom.
A single click just applies the defined options to the clicked atom.</p>

<p align="center"><img src="help/img/chem/atomDialog.jpeg"></p>
<p><center><i>Atom dialog to define isotops, abnormal valences and custom atom labels.</i></center></p>

Most of the structure editor's buttons are rather self explanatory, while some
need a little more explanation. For a detailed description of the editor please
refer to the <a href="help/../editor/editor.html">Structure Editor</a> section.</p>
<br>

<h3>Substructure Search</h3>

<p>A filter in substructure search mode hides all rows, whose structures do not contain
the drawn molecule fragment.
Atoms or bonds of the query fragment can be specified more precisely to narrow
the search. A double click on an atom or bond after selecting the lasso tool opens
an atom or bond query feature dialog. If the double clicked atom or bond is one of
multiple selected ones, then the dialog's settings are applied to all selected atoms
or bonds when the dialog is closed with <span class="menu">OK</span>.</p>

<p align="center"><img src="help/img/chem/atomQF.jpeg"><br></p>
<p><center><i>Atom query feature dialog of structure editor in fragment mode.</i></center></p>

<p align="left">The first option in the atom query feature dialog lets you convert
a defined atom into a wild card. In the next <span class="menu">allowed atoms</span>
field one may specify more precisely, which atom are allowed atom this position.
If <span class="menu">any atomic number</span> is selected and the respective atom is
declared as wild card then the <span class="menu">allowed atoms</span> field turns
into an <span class="menu">excluded atoms</span> field, accepting any atom the explicitly
specified ones. While these options make the query fragment less specific, all following
options narrow the search. These options are self explanatory.</p>

<p align="center"><img src="help/img/chem/query.jpeg"></p>
<p><center><i>Query fragment with yellow markers to indicated not visible query features.</i></center></p>

<p>In the <span class="keyword">Structure Editor</span> atom or bond query features are
usually reflected in the drawing in an obvious way or indicated with small letters.
More complex query features may not be shown directly. In this case affected atoms or
bonds are indicated in yellow for easy recognition.</p>

<p align="center"><img src="help/img/chem/bondQF.jpeg"></p>
<p><center><i>Bond query feature dialog of structure editor in fragment mode.</i></center></p>

<p>In the bond query features dialog one may define more than one bond types to satisfy the search.
The <span class="keyword">Single</span> or <span class="keyword">Double</span> bond options
match any non-delocalized single or double bonds, respectively. <span class="keyword">DataWarrior</span>
considers 5-membered aromatic rings as non-delocalized. Therefore, the
<span class="keyword">Delocalized</span> option does not match to any bond of an aromatic
5-membered ring, unless it is annelated to an aromatic 6-membered ring, which is considered
delocalized and which causes the shared bond between the two rings to be delocalized as well.
In general <span class="keyword">DataWarrior</span> considers aromatic bonds with no
preferred mesomeric structure being delocalized.</p>

<p>A query bond may also be redefined to be a chain of multiple connected atoms
rather than a directly connecting bond. The allowed range for the chain length
dose not include the end atoms already drawn as part of the query structure.</p>

<p>If <span class="menu">Highlight Structure By -> Recent Filter</span> is selected in column
header menu of the structure column in the <span class="keyword">Table View</span>, then
the query fragments as part of the entire molecule are drawn in dark red, while the rest
of the displayed structures is usually drawn in black.</p>
<br>

<h3>Similarity search</h3>
  
<p>For focusing on structurally similarity compounds rather than on compounds sharing a specific
sub-structure, select any of the <span class="menu">is similar to [...]</span> options. Then
double-click the filter's structure area and draw a molecule using the
<span class="keyword">Structure Editor</span> or drag and drop a molecule from somewhere else to the filter.
You may adjust the similarity level to your needs. Typically, chemists will perceive molecules
as very similar if their similarity value is about 0.90 or above.</p>

<p align="center"><img src="help/img/chem/simSearch.jpeg"></p>

<p>If <span class="menu">Highlight Structure By -> Recent Filter</span> is selected in column
header menu of the structure column in the <span class="keyword">Table View</span>, then structural
elements that are shared between displayed structures and the query structure are highlighted in green.</p>

<p>Per default Datawarrior calculates a <span class="keyword">FragFp descriptor</span>
of the first structure column within the data table. This descriptor can be used to
calculate similarities between molecules. The <span class="keyword">FragFp</span> similarity
between two molecules is the number of fragments that both molecules have in common
devided by the number of fragments being found in any of the two molecules.
If your dataset contains other descriptors, then your filter menu contains associated
<span class="menu">is similar to [...]</span> options that you may choose to filter
by another similarity criterion.
Other descriptors can be calculated by choosing <span class="menu">Chemistry->Add Descriptor->...</span>
from the menu. Once the calculation has been finished, the associated similarity option
gets available in the filter menu.
Here you can find more information on <a href="help/similarity.html">descriptors and similarities</a>
and <a href="help/similarity.html#WhichDescriptor">which kind of similarity</a> should be used for which purpose.</p>
<br>

<h3><a name="CompareFiles"></a>Comparing Structure Files</h3>

<p>Sometimes the need arises to compare two rather big sets of compounds for overlaps, i.e.
for compounds in one set, which are matched by very similar compounds in the other set.
Typically, this task is due whenever new compounds shall be purchased from some provider,
which should be substantially different to the compounds one already owns. 
</p>

<p align="center"><img src="help/img/chem/compareFiles.jpeg"></p>
<br>


<h3><a name="SelectDiverse"></a>Selecting Diverse Compounds from Large Set</h3>
<p>
This function is an efficient implementation for locating a most diverse subset
within a given set of molecules. The algorithm can be preloaded with a second
set of molecules, causing the algorithm to select molecules, which are both, most
different to any molecule in the secons set and highly diverse among the selection.
Especially for this reason, this function is perfectly suited to select diverse
screening compounds from a provider's catalog avoiding any compound being similar
to already available in-house compounds.</p>
<p align="center"><img src="help/img/chem/selectDiverseDialog.jpeg"></p>
<p><center><i>Dialog configured to select 50000 diverse compound different to currentLibrary.dwar.</i></center></p>
<p>All binary descriptors can be used with this algorithm. After computing the
desired number of diverse compounds a column is added to the dataset with ascending
numbers indicating selected compounds. The compound with number 1 is that compounds,
which is most different to all the others. Compound number 2 is most different from
number 1. Compound 3 is the one most different to 1 and 2 and so forth. If a dataset
contains a few awkward compounds, then these are likely to be picked first. Therefore,
in reality one would often skip the very first compounds of the diverse selection.</p>
<br>

<h3><a name="ClusterCompounds"></a>Clustering Compounds</h3>
<p>Clustering is an old cheminformatics technique for subdividing a typically large
compound collection into small groups of similar compounds. Clustering was used in
the old days, when computational resources were expensive, to precompute similarity
relationships between compounds. Cluster membership could be stored easily in databases
to be quickly retrieved later, whenever the need arose to locate similar structures
to any given structure, e.g. after a high-throughput screening. The inherent problem
of clustering is that cluster borders are arbitrary and may separate very similar
compounds into different clusters. Therefore, the retrieval of all cluster co-member
of a given compound does not necessarily result in the most similar compounds.</p>

<p>The cluster algorithm implemented in <i>DataWarrior</i> is simple, reproducible,
but computationally demanding and, therefore, best used if the dataset doesn't
contain far beyond 10000 compounds. First the complete similarity matrix is calculated,
which can be done with any descriptor. Then, in a stepwise process the most similar
compounds or clusters are merged to form a new cluster, whose similarity to the
remaining compounds and clusters is re-calculated as a weighted mean from its members.
The merging process continues until a stop criterion is met. Stop criteria can be
defined in the cluster dialog.</p>

<p align="center"><img src="help/img/chem/clusterDialog.jpeg"></p>

<p>The clustering process may be defined to stop when the cluster count reaches
a desired number or when the similarity needed to join two clusters falls below
a definable limit. If both criteria are defined, then the clustering stops if any
of both criteria are met.</p>
<br>

<h3><a name="MolecularProperties"></a>Calculating Molecular Properties</h3>

<p><i>DataWarrior</i> is able to calculate certain physico-chemical properties,
lead- or drug-likeness related parameters, ligand efficiencies, various atom and
ring counts, molecular shape, flexibility and complexity as well as indications
for potential toxicity. After calculating properties, these are automatically
added as new columns to the data table.</p>
<p>To calculate any molecular properties from chemical structures select
<span class="menu">Add Compound Properties...</span> from the <span class="menu">
Chemistry</span> menu. Select the properties of interest from one or more
property sections and click <span class="menu">OK</span>.</p>

<p align="center"><img src="help/img/chem/properties.jpeg"></p>

<p>Properties related to the ligand efficiency are based on IC50 values and require
the selection of a corresponding numerical column that contains IC50-values.</p>
<p>Some properties match those available in the <i>OSIRIS Property Explorer</i>,
which was made public in 2000 and in now maintained on www.openmolecules.org.
These properties and the algorithms used are explained in more detail
<a href="help/../properties/properties.html">here</a>.</p> 
<br>

<h3><a name="VirtualLibraries"></a>Enumerating Virtual Libraries</h3>

<p>DataWarrior can generate all structures of a virtual combinatorial library
if a generic reaction is drawn and for every generic reactant a list of real reactant
structures is provided. The enumerated product structures could be used to
predict physico-chemical properties and to select those products with the most
promising properties for synthesis or to be purchased.</p>

<p>To create virtual libraries, select <span class="menu">Create Combinatorial Library...</span>
from the <span class="menu">Chemistry</span> menu. Then draw or
<span class="menu">Open</span> a generic reaction schema in the following dialog:</p>

<p align="center"><img src="help/img/chem/reactionEditor.jpeg"></p>

<ul>
<li>Map each atom involved in the reaction using the mapping tool.
<img src="help/../editor/mappingTool.gif"></li>

<li>You may save your generic reaction for later re-use</li>
<li>Click <span class="menu">OK</span> to switch from the reaction to the reactant dialog.</li>
</ul>

<p align="center"><img src="help/img/chem/reactantDialog.jpeg"></p>

<ul>
<li>Define all reactants by editing them, drag&drop, copy/paste or loading them from a file.</li>
<li>Click <span class="menu">OK</span> to start the enumeration.</li>
</ul>

<p>When creating the product structures, <i>DataWarrior</i> retains the atom coordinates
of the generic product. Therefore all products are later shown in the expected orientation.
After all product structures have been created, <i>DataWarrior</i> creates some default
views. Now you may continue by calculating physico-chemical properties for all virtual
products, cluster them or running some other kind of analysis.</p>
<br>

<h3><a name="EvolutionaryLibraries"></a>Generating Evolutionary Libraries</h3>

<p>Chemical space is huge. There are estimates that the number of distinct stable molecular
structures with a molecular weight in the drug-like range is about 10<sup>60</sup>.
It will never be possible to compute all these structures to search them for the one
with the most promising property profile for a particular purpose. Nevertheless,
navigating this vast space and locating unknown promising compounds can give new ideas.
</p>
<p>In cheminformatics sometimes a de-novo-design approach is taken to design new structures
from scratch with a high likelyhood of being active on a chosen target. Often this starts
with a small fragment, to which atoms or small fragments are added satisfying a ligand or
protein based fitness criterion.</p>
<p><i>DataWarrior</i> uses an evolutionary approach mimicking nature by randomly mutating
existing molecular structures with tiny changes to create new generations of potentially
better structures. Every generation of molecules is checked for fitness by a set of
customizable criteria and the most promising structures survive serve as starting points for the
next generation. The mutation algorithm performs changes like single atom replacements,
atom insertions, bond order changes, substituent migrations, ring aromatisations, etc.
For any structure being about to be mutated, all possible mutations are evaluated
regarding how much the the change increases or decreases the <i>drug-likeness</i>
(or optionally <i>natural-product-likeness</i>). Mutations with a change in the desires
direction are assigned a higher propability that mutations that decrease drug-
or natural-product-likeness. Mutations, which would create high ring strains are removed
from the list.</p>
<p>In the fitness panel of the <i>Evolutionaly Library Dialog</i> a desired compound
property portfolio can be defined. For simple properties one may define an optimal
numerical range. One may also require compounds to be similar or dissimilar to
a definable set of compounds using any descriptor. All individual fitness criteria
can be weighted to make them more or less important than others.</p>

<p align="center"><img src="help/img/chem/fitness.jpeg"></p>

<p>In the fitness example above we look for compounds, whose chemical structure is
dissimilar to any of three known inhibitors, while at the same time being similar
to at least one of these inhibitors considering flexophore similarity. In other
words, we are looking for compounds with a similar target binding behaviour, but
with a dissimilar chemical structure to the known inhibitors.</p>

<p align="center"><img src="help/img/chem/eLibrary.jpeg"></p>

<p>The picture above shows the <span class="menu">General</span> panel
of the dialog after starting the evolutionary process. As starting point the
structure of LDS has been selected, which is as good as any other starting point.
The type of compounds being created is set to <span class="menu">natural products</span>.
We see the structure of the currently mutated parent molecule, the molecules in the
current generation and the overall best ranking molecule. The background color
of these molecules reflect how well the fitness portfolio is already met.
Any time during the evolution process one may click <span class="menu">Stop</span>
to create a new document with the fittest structures of all generations.</p>
<br>

<h3><a name="SARTables"></a>Creating SAR-Tables</h3>

<p><i>Structure-Activity Relationship (SAR)</i> Tables are frequently used to
correlate biological properties with the substitution patterns of compound sets
that share one or a few chemical scaffolds and often have been synthesised
in a combinatorial fashion. If a dataset contains chemical structures,
<i>DataWarrior</i> may decompose the structures by analysing scaffold and substituents
and putting them into new dedicated columns.
This can be done either fully automatic or a little more flexible with some user guidance.
Similar functionality of this kind is sometime called <i>R-group-decomposition</i> or
<i>R-group-deconvolution</i>.</p>

<p>To automatically create a <span class="keyword">SAR-Table</span> from your dataset,
select <span class="menu">Automatic SAR-Analysis...</span> from the
<span class="menu">Chemistry</span> menu. A dialog lets you choose the mechanism
that is used to determine the scaffold.
<ul><li><span class="menu">Most central ring system</span>: As the name implies, with
this option that ring system of the molecule, which is closest to the topological center
of the molecule, is taken as the scaffold.</li>
<li><span class="menu">Murcko scaffold</span>: The <i>Murcko scaffold</i> of a molecule
is determined by locating all ring systems of the molecule and all direct connections
between them. Everything else is considered as substituents.</li>
</ul>
Compounds without any rings are not subjected to the analysis and their cells
in the new columns remain empty.</p>

<p>If you need more flexibility in determining, which sub-structures should be considered
the central scaffold, then you should select <span class="menu">Core based SAR-Analysis...</span>
from the <span class="menu">Chemistry</span> menu. The dialog lets you define a sub-structure
that may include query features. By using atom wild cards or variable bond bridges,
one drawn sub-structure may detect multiple different scaffolds at once. Nevertheless,
often the <span class="menu">Core based SAR-Analysis...</span> function needs to be
used multiple times to process all scaffolds in the dataset.</p>

<p>Whatever mode you use to generate a <span class="keyword">SAR-Table</span>,
for every different scaffold, the entire dataset is processed to find those scaffold atoms,
which have variations concerning their substituents they carry. For all of these, a
generic R-group is attached to the scaffold. If a scaffold atom is always carrying the
same substituent, this substituent is attached to the scaffold. The scaffold with attached
R-groups and substituents is called <span class="keyword">core-fragment</span> and
put into the first new table column. New R-group columns are added according to the needs.</p>

<p align="center"><img src="help/img/chem/SARResult.jpeg"></p>
<p><center><i>Table view with new columns added after SAR-Analysis</i></center></p>
<br>

<h3><a name="SimAnalysis"></a>Similarity Analysis</h3>

<p>In the recent literature<a href="help/chemistry.html#litSimAnalysis"><sup>1-3</sup></a> the terms <i>Molecular Similarity Analysis</i>, <i>Activity Cliff Analysis</i>
or <i>Activity Landscape</i> are hot topics. All these related methods have in common that they usually
start with a 2-dimensional scaling process of the chemical space, which means that all involved molecules
are positioned somehow on a 2D-area, such that similar molecules are located close to each other.
This scaling could be done by running a <i>principal component analysis (PCA)</i> on a descriptor of
the molecules and using the first two components as coordinates.
Another approach would be a <i>self organizing map (SOM)</i> from a descriptor. Both of these options are
limited in terms of the descriptor type, because they require input data to be vector, i.e. a binary or numerical
array of data.</p>
<p>While <span class="keyword">DataWarrior</span> allows running PCAs or SOMs on descriptor vectors and
visualizing the results as a chemical landscape, the <span class="keyword">Similarity Analysis</span>
is based on a different method. It uses a <span class="keyword">Rubberbanding Forcefield</span> approach,
which translates similarity better than a PCA, is faster than a SOM, uses the available space more efficiently
and works with any type of similarity criterion including the <span class="keyword">Flexophore descriptor</span>.</p>
<p>The approach involves the following steps:<ul>
<li>randomly position all molecules on the 2D space</li>
<li>calculate the entire similarity matrix between all molecules</li>
<li>locate most similar neighbors to be considered for every molecule</li>
<li>between any two neighbors assume attractive forces, which increase with similarity and distance</li>
<li>stepwise relocate all molecules parallel to the mean vector of perceived forces</li>
<li>while attractive forces decrease over time and due to lower distances, introduce increasing short range
repelling forces among all molecules</li>
</ul></p>
<p align="center"><img src="help/img/chem/orgFuncSimAnalysis.jpeg"></p>
<p><center><i>Three default views after similarity analysis</i></center></p>

<p>When DataWarrior has finished the calculation of molecule positions, it creates three new default views:
<ul><li>A view depicting the chemical space of all molecules. Similar neighbors are connected with a
connecting line and the markers that represent the molecules are colored dynamically by molecule similarity
to the chosen <span class="keyword">Current Molecule</span>, which changes whenever you click another marker.</li>
<li>A tree view that shows the direct neighbors of the chosen <span class="keyword">Current Molecule</span>.
When a marker or molecule is clicked on in any view, the <span class="keyword">Current Molecule</span>
changes and the tree view's content is dynamically updated to show the neighborhood of the new molecule.</li>
<li>A structure view, which is configured to show selected molecules on top, while the non-selected ones
are grayed out. The highlight mode of the respective structures column is set to
<span class="keyword">Current Row Similarity</span>, causing any displayed molecule to show any structural
differences to the molecule of the <span class="keyword">Current Row</span>. Structural elements possessed
by the reference molecule, which are not part of the depicted molecule, are shown in red. Structural elements
of the shown molecule, which are not present in the reference molecule, are highlighted with a blue background.
To change the selection of displayed molecules, you simply need to select different markers in the tree view
or on the similarity map.</li>
</ul></p>
<p>Since a <span class="keyword">Similarity Analysis</span> is very much related to a
<span class="keyword">Activity Cliff Analysis</span>, more information about how to configure and run
a similarity analysis can be found at the <a href="help/chemistry.html#SALIAnalysisConfig">end of the next section</a>.</p>
<p>
<a name="litSimAnalysis"></a>
1) Peltason L, Bajorath J; Molecular similarity analysis uncovers heterogeneous structure-activity relationships
and variable activity landscapes.; <i>Chem Biol.</i>, <b>2007</b>, <i>14</i> (5), pp 489-97<br>
2) Guha R, Van Drie J H; Structure-Activity Landscape Index: Identifying and Quantifying Activity Cliffs;
<i>J. Chem. Inf. Model.</i>, <b>2008</b>, <i>48</i> (3), pp 646-658; DOI: 10.1021/ci7004093<br>
3) Bajorath J, Peltason L, Wawer M, Guha R, Lajiness M S, Van Drie J H; Navigating structure-activity landscapes;
<i>Drug Discovery Today</i>, <b>2009</b>, <i>14</i> (13-14), pp 698-705
</p><br>

<h3><a name="SaliAnalysis"></a>Activity Cliff Analysis</h3>
<p>The <span class="keyword">Activity Cliff Analysis</span> uses the same mechanism already explained
in the previous section to create a similarity map of all involved molecules. It also detects all
similarity relationships between them above an automatically determined similarity threshold. To be precise,
this is not a global cutoff value, but is modulated from molecule to molecule. Depending on the neighborhood
situation of an individual molecules the threshold may be increased or decreased to accound for many very
similar or few not even similar neighbors. This reduces singletons and untangles large clusters to some extend.</p>
<p>In addition to the <span class="keyword">Similarity Analysis</span> the so-called <span class="keyword">
Structure-Activity Landscape Index (SALI)</span> is calculated for all pairs of similar molecules.
If two molecules with measured activities a1 and a2 and their structural similarity being s,
then the SALI value between these molecules is defined as SALI = |a1-a2| / (1-s). The SALI value is a measure
of how much activity is gained (or lost) with a relatively small change in structure. Molecule pairs
that show an abrupt change in activity despite having a rather similar structure are called activity cliffs.
These pairs are particularly interesting, if one tries to understand structure-activity relationships
in order to design new structural motives with improved activities.</p>
<p align="center"><img src="help/img/chem/saliView.jpeg"></p>
<p>After an <span class="keyword">Activity Cliff Analysis</span> the generated similarity view
encodes SALI values and activites in marker size and marker color, respectively.
The image above shows a part of such a similarity map. In this case the dataset contained EC50 values
on Cannabinoid CB1 and CB2 receptors. The marker background color reflects the receptor subtypes
(CB1:pink, CB2: orange). One can easily recognize clusters of similar compounds, locate active compounds
(red markers), locate activity cliffs (large markers), and even distinguish CB1 from CB2 inhibitors.</p>
<br>
<h3><i><a name="SALIAnalysisConfig"></a>Configuring And Running A Similarity Or Activity Cliff Analysis</i></h3>
<p>
To perform a <span class="keyword">Similarity or Activity Cliff Analysis</span> choose
<span class="menu">Analyse Similarity/Activity Cliffs...</span> from the <span class="menu">Chemistry</span>
menu. The following dialog appears:</p>
<p align="center"><img src="help/img/chem/simAnalysisDialog.jpeg"></p>
<p><center><i>Similarity Analysis Dialog</i></center></p>
<p><b>Similarity on:</b> Defines the similarity criterion, i.e. the descriptor that is used for arranging
molecules on the 2D-map. One may use any descriptor that <span class="keyword">DataWarrior</span> knows of,
provided that is has been calculated previously for the current data file. Most useful descriptors are
<span class="keyword">SkelSpheres</span> for fine-grained chemical graph similarity,
<span class="keyword">OrgFunctions</span> for similarity on synthetically relevant organic functionality,
and <span class="keyword">Flexophore</span> to create a molecule map based on the similarity of protein binding
characteristics.</p>
<p><b>Activity column:</b> For a <span class="keyword">Similarity Analysis</span> don't select a column here.
For an <span class="keyword">Activity Cliff Analysis</span> you need to select that
column that contains the numerical value to calculate SALI values from. For any pair of molecules the SALI
value reflects how much activity is gained with a small change of the chemical structure. Very high SALI
values identify activity cliffs, i.e. those rare points in an activity landscape, where a small change of the
chemical structure causes a large change in activity (or any other experimentally determined molecule property.
Identifying these molecule pairs and understanding the structural cause of the activity change can be very
helpful in the process of designing compounds with better properties.</p>
<p><b>Identifier column:</b> The <span class="keyword">Similarity or Activity Cliff Analysis</span> detects
for evey molecule its most similar neighbor molecules and writes a reference to those molecules into a new column.
Therefore it needs a column that contains a key that uniquely identifies a molecule or data row. If your data
contains compound identifiers you may select that column. Otherwise, <span class="keyword">DataWarrior</span>
will create a new number for that purpose.</p>
<p><b>Separate groups by:</b> In some cases one column contains data experimental data refering to
multiple targets or measured under different conditions. If a second column contains categories describing
the conditions, and if only values within the same category can be compared to each other, then you should
select the category column here. Then SALI value will only be calculated from compatible experimental values.</p>
<p><b>Similarity limit:</b> Usually <span class="menu">Automatic</span> does a good job. However, if you
prefer getting more or less neighborship relationsips than the automatic process generates, then you may
disable the automatic setting and (moderately) update the threshold defining slider. If the limit is set too
high then this may cause the 2D-scaling find too little similarity relationships. The final map may then not
be much different from the initial state of randomly scattered molecules. If the limit is
set to low and therefore too many similarity pairs are found, then a highly interconnected bunch of molecules
won't equilibrate well.</p>
<p><b>Create view based on similarity relationships:</b> If this option is checked, a similarity map of all
molecules is created. Therefore a <span class="keyword">Rubberbanding Forcefield</span> is employed to
incrementally equilibrate 2D-coordinates for all molecules until an energy minimum is reached and all
molecules are positioned close to their most similar neighbors. Afterwards a 2D-view is created to visualize
the similarity map.</p>
<p><b>Create document of structure pairs:</b> If this option is checked, then
<span class="keyword">DataWarrior</span> creates a new document in an open window, which contains all
detected similarity relationships in dedicated rows. Two columns contain the two neighbor molecules;
additional columns contain molecule identifiers, similarity, activities, and SALI values.</p>
<br>
<p align="center">Continue with <a href="help/similarity.html">Molecule Similarities</a>...</p>
<br>


</body>
</html>
